Overview

Dataset statistics

Number of variables: 13
Number of observations: 1359
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 138.1 KiB
Average record size in memory: 104.1 B

Variable types

NUM: 13

Reproduction

Analysis started: 2020-05-17 21:38:01.079024
Analysis finished: 2020-05-17 21:38:34.664235
Duration: 33.59 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Unique: df_index has unique values
Zeros: citric acid has 118 (8.7%) zeros
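
The figures in these two warnings can be checked with simple arithmetic. The sketch below only reproduces the numbers from counts reported elsewhere in this document; the variable names are illustrative, and the thresholds pandas-profiling applies before raising a warning are not shown in this report and are not assumed here.

```python
# Counts copied from this report.
n_observations = 1359
citric_acid_zeros = 118
df_index_distinct = 1359

# Percentage of zeros, rounded to one decimal place as displayed in the report.
zeros_pct = round(100 * citric_acid_zeros / n_observations, 1)

# A column has unique values when every observation is distinct.
df_index_is_unique = df_index_distinct == n_observations

print(zeros_pct, df_index_is_unique)
```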

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct count: 1359
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 792.6931567328918
Minimum: 0
Maximum: 1598
Zeros: 1
Zeros (%): 0.1%
Memory size: 10.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 72.9
Q1: 388.5
Median: 785
Q3: 1193.5
95-th percentile: 1521.2
Maximum: 1598
Range: 1598
Interquartile range (IQR): 805

Descriptive statistics

Standard deviation: 465.38084
Coefficient of variation (CV): 0.5870882523
Kurtosis: -1.211617341
Mean: 792.6931567
Median Absolute Deviation (MAD): 404
Skewness: 0.01947511873
Sum: 1077270
Variance: 216579.3262
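Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1598 | 1 | 0.1%
522 | 1 | 0.1%
513 | 1 | 0.1%
515 | 1 | 0.1%
516 | 1 | 0.1%
517 | 1 | 0.1%
518 | 1 | 0.1%
519 | 1 | 0.1%
520 | 1 | 0.1%
521 | 1 | 0.1%
Other values (1349) | 1349 | 99.3%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1 | 0.1%
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
5 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
1598 | 1 | 0.1%
1597 | 1 | 0.1%
1595 | 1 | 0.1%
1594 | 1 | 0.1%
1593 | 1 | 0.1%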

fixed acidity
Real number (ℝ≥0)

Distinct count: 96
Unique (%): 7.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.310596026490067
Minimum: 4.6
Maximum: 15.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 10.6 KiB

Quantile statistics

Minimum: 4.6
5-th percentile: 6.1
Q1: 7.1
Median: 7.9
Q3: 9.2
95-th percentile: 11.71
Maximum: 15.9
Range: 11.3
Interquartile range (IQR): 2.1

Descriptive statistics

Standard deviation: 1.736989808
Coefficient of variation (CV): 0.2090090533
Kurtosis: 1.049673362
Mean: 8.310596026
Median Absolute Deviation (MAD): 1
Skewness: 0.9410413665
Sum: 11294.1
Variance: 3.017133591
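
Several of the fixed acidity figures above can be cross-checked against one another, because the derived statistics follow from the basic ones. The sketch below recomputes them in plain Python from the reported values. It assumes the usual textbook definitions (mean = sum / count, range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared); the variable names are illustrative only.

```python
# Values copied from the fixed acidity section of this report.
n_observations = 1359
minimum, maximum = 4.6, 15.9
q1, q3 = 7.1, 9.2
std_dev = 1.736989808
total = 11294.1

# Assumed (standard) definitions of the derived statistics.
mean = total / n_observations      # arithmetic mean
value_range = maximum - minimum    # range
iqr = q3 - q1                      # interquartile range
cv = std_dev / mean                # coefficient of variation
variance = std_dev ** 2            # variance

print(f"mean={mean:.6f} range={value_range:.1f} iqr={iqr:.1f}")
print(f"cv={cv:.6f} variance={variance:.6f}")
```

The same relationships can be used to sanity-check any other variable in the report, up to the rounding of the displayed values.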
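Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
7.2 | 49 | 3.6%
7.8 | 48 | 3.5%
7.1 | 46 | 3.4%
7 | 44 | 3.2%
7.5 | 42 | 3.1%
7.6 | 41 | 3.0%
7.7 | 40 | 2.9%
6.8 | 38 | 2.8%
7.9 | 38 | 2.8%
8.2 | 37 | 2.7%
Other values (86) | 936 | 68.9%

Minimum 5 values

Value | Count | Frequency (%)
4.6 | 1 | 0.1%
4.7 | 1 | 0.1%
4.9 | 1 | 0.1%
5 | 6 | 0.4%
5.1 | 4 | 0.3%

Maximum 5 values

Value | Count | Frequency (%)
15.9 | 1 | 0.1%
15.6 | 2 | 0.1%
15.5 | 1 | 0.1%
15 | 1 | 0.1%
14.3 | 1 | 0.1%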

volatile acidity
Real number (ℝ≥0)

Distinct count: 143
Unique (%): 10.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5294775570272259
Minimum: 0.12
Maximum: 1.58
Zeros: 0
Zeros (%): 0.0%
Memory size: 10.6 KiB

Quantile statistics

Minimum: 0.12
5-th percentile: 0.27
Q1: 0.39
Median: 0.52
Q3: 0.64
95-th percentile: 0.8505
Maximum: 1.58
Range: 1.46
Interquartile range (IQR): 0.25

Descriptive statistics

Standard deviation: 0.1830313176
Coefficient of variation (CV): 0.3456828626
Kurtosis: 1.249243497
Mean: 0.529477557
Median Absolute Deviation (MAD): 0.125
Skewness: 0.7292789464
Sum: 719.56
Variance: 0.03350046323
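Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0.5 | 37 | 2.7%
0.58 | 36 | 2.6%
0.4 | 35 | 2.6%
0.6 | 34 | 2.5%
0.43 | 33 | 2.4%
0.38 | 31 | 2.3%
0.59 | 31 | 2.3%
0.39 | 31 | 2.3%
0.49 | 30 | 2.2%
0.42 | 30 | 2.2%
Other values (133) | 1031 | 75.9%

Minimum 5 values

Value | Count | Frequency (%)
0.12 | 1 | 0.1%
0.16 | 2 | 0.1%
0.18 | 7 | 0.5%
0.19 | 2 | 0.1%
0.2 | 3 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
1.58 | 1 | 0.1%
1.33 | 2 | 0.1%
1.24 | 1 | 0.1%
1.185 | 1 | 0.1%
1.18 | 1 | 0.1%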

citric acid
Real number (ℝ≥0)

ZEROS

Distinct count: 80
Unique (%): 5.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2723325974981604
Minimum: 0.0
Maximum: 1.0
Zeros: 118
Zeros (%): 8.7%
Memory size: 10.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.09
Median: 0.26
Q3: 0.43
95-th percentile: 0.6
Maximum: 1
Range: 1
Interquartile range (IQR): 0.34

Descriptive statistics

Standard deviation: 0.1955365446
Coefficient of variation (CV): 0.7180063876
Kurtosis: -0.7889205005
Mean: 0.2723325975
Median Absolute Deviation (MAD): 0.17
Skewness: 0.3127255424
Sum: 370.1
Variance: 0.03823454025
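Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0 | 118 | 8.7%
0.49 | 59 | 4.3%
0.24 | 41 | 3.0%
0.02 | 38 | 2.8%
0.08 | 32 | 2.4%
0.26 | 30 | 2.2%
0.1 | 29 | 2.1%
0.4 | 27 | 2.0%
0.32 | 26 | 1.9%
0.31 | 26 | 1.9%
Other values (70) | 933 | 68.7%

Minimum 5 values

Value | Count | Frequency (%)
0 | 118 | 8.7%
0.01 | 25 | 1.8%
0.02 | 38 | 2.8%
0.03 | 24 | 1.8%
0.04 | 24 | 1.8%

Maximum 5 values

Value | Count | Frequency (%)
1 | 1 | 0.1%
0.79 | 1 | 0.1%
0.78 | 1 | 0.1%
0.76 | 3 | 0.2%
0.75 | 1 | 0.1%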

residual sugar
Real number (ℝ≥0)

Distinct count: 91
Unique (%): 6.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.5233995584988964
Minimum: 0.9
Maximum: 15.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 10.6 KiB

Quantile statistics

Minimum: 0.9
5-th percentile: 1.6
Q1: 1.9
Median: 2.2
Q3: 2.6
95-th percentile: 4.8
Maximum: 15.5
Range: 14.6
Interquartile range (IQR): 0.7

Descriptive statistics

Standard deviation: 1.352313758
Coefficient of variation (CV): 0.5359094849
Kurtosis: 29.36459187
Mean: 2.523399558
Median Absolute Deviation (MAD): 0.3
Skewness: 4.548153404
Sum: 3429.3
Variance: 1.828752499
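Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
2 | 133 | 9.8%
2.2 | 110 | 8.1%
1.8 | 108 | 7.9%
2.1 | 104 | 7.7%
1.9 | 97 | 7.1%
2.3 | 86 | 6.3%
2.4 | 74 | 5.4%
2.5 | 74 | 5.4%
2.6 | 71 | 5.2%
1.7 | 62 | 4.6%
Other values (81) | 440 | 32.4%

Minimum 5 values

Value | Count | Frequency (%)
0.9 | 1 | 0.1%
1.2 | 7 | 0.5%
1.3 | 5 | 0.4%
1.4 | 29 | 2.1%
1.5 | 25 | 1.8%

Maximum 5 values

Value | Count | Frequency (%)
15.5 | 1 | 0.1%
15.4 | 1 | 0.1%
13.9 | 1 | 0.1%
13.8 | 1 | 0.1%
13.4 | 1 | 0.1%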

chlorides
Real number (ℝ≥0)

Distinct count: 153
Unique (%): 11.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.08812362030905077
Minimum: 0.012
Maximum: 0.611
Zeros: 0
Zeros (%): 0.0%
Memory size: 10.6 KiB

Quantile statistics

Minimum: 0.012
5-th percentile: 0.053
Q1: 0.07
Median: 0.079
Q3: 0.091
95-th percentile: 0.1376
Maximum: 0.611
Range: 0.599
Interquartile range (IQR): 0.021

Descriptive statistics

Standard deviation: 0.04937686244
Coefficient of variation (CV): 0.5603135944
Kurtosis: 38.62465317
Mean: 0.08812362031
Median Absolute Deviation (MAD): 0.01
Skewness: 5.502487295
Sum: 119.76
Variance: 0.002438074545
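Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0.08 | 50 | 3.7%
0.078 | 44 | 3.2%
0.074 | 43 | 3.2%
0.084 | 40 | 2.9%
0.076 | 39 | 2.9%
0.079 | 39 | 2.9%
0.082 | 38 | 2.8%
0.075 | 37 | 2.7%
0.077 | 36 | 2.6%
0.071 | 36 | 2.6%
Other values (143) | 957 | 70.4%

Minimum 5 values

Value | Count | Frequency (%)
0.012 | 1 | 0.1%
0.034 | 1 | 0.1%
0.038 | 2 | 0.1%
0.039 | 4 | 0.3%
0.041 | 4 | 0.3%

Maximum 5 values

Value | Count | Frequency (%)
0.611 | 1 | 0.1%
0.61 | 1 | 0.1%
0.467 | 1 | 0.1%
0.464 | 1 | 0.1%
0.422 | 1 | 0.1%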

free sulfur dioxide
Real number (ℝ≥0)

Distinct count: 60
Unique (%): 4.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.893303899926417
Minimum: 1.0
Maximum: 72.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 10.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 7
Median: 14
Q3: 21
95-th percentile: 35
Maximum: 72
Range: 71
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 10.44727026
Coefficient of variation (CV): 0.6573378528
Kurtosis: 1.892690741
Mean: 15.8933039
Median Absolute Deviation (MAD): 7
Skewness: 1.226579499
Sum: 21599
Variance: 109.1454559
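Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
6 | 121 | 8.9%
5 | 88 | 6.5%
15 | 65 | 4.8%
12 | 64 | 4.7%
10 | 63 | 4.6%
7 | 61 | 4.5%
9 | 55 | 4.0%
16 | 53 | 3.9%
17 | 50 | 3.7%
11 | 49 | 3.6%
Other values (50) | 690 | 50.8%

Minimum 5 values

Value | Count | Frequency (%)
1 | 2 | 0.1%
2 | 1 | 0.1%
3 | 41 | 3.0%
4 | 34 | 2.5%
5 | 88 | 6.5%

Maximum 5 values

Value | Count | Frequency (%)
72 | 1 | 0.1%
68 | 1 | 0.1%
66 | 1 | 0.1%
57 | 1 | 0.1%
55 | 1 | 0.1%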

total sulfur dioxide
Real number (ℝ≥0)

Distinct count: 144
Unique (%): 10.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 46.82597498160412
Minimum: 6.0
Maximum: 289.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 10.6 KiB

Quantile statistics

Minimum: 6
5-th percentile: 11
Q1: 22
Median: 38
Q3: 63
95-th percentile: 113
Maximum: 289
Range: 283
Interquartile range (IQR): 41

Descriptive statistics

Standard deviation: 33.40894571
Coefficient of variation (CV): 0.7134703702
Kurtosis: 4.042256741
Mean: 46.82597498
Median Absolute Deviation (MAD): 19
Skewness: 1.540368078
Sum: 63636.5
Variance: 1116.157653
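Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
28 | 35 | 2.6%
24 | 32 | 2.4%
14 | 30 | 2.2%
20 | 29 | 2.1%
15 | 28 | 2.1%
18 | 28 | 2.1%
19 | 27 | 2.0%
23 | 27 | 2.0%
12 | 26 | 1.9%
27 | 25 | 1.8%
Other values (134) | 1072 | 78.9%

Minimum 5 values

Value | Count | Frequency (%)
6 | 2 | 0.1%
7 | 4 | 0.3%
8 | 11 | 0.8%
9 | 13 | 1.0%
10 | 23 | 1.7%

Maximum 5 values

Value | Count | Frequency (%)
289 | 1 | 0.1%
278 | 1 | 0.1%
165 | 1 | 0.1%
160 | 1 | 0.1%
155 | 1 | 0.1%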

density
Real number (ℝ≥0)

Distinct count: 436
Unique (%): 32.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9967089477557026
Minimum: 0.99007
Maximum: 1.00369
Zeros: 0
Zeros (%): 0.0%
Memory size: 10.6 KiB

Quantile statistics

Minimum: 0.99007
5-th percentile: 0.993569
Q1: 0.9956
Median: 0.9967
Q3: 0.99782
95-th percentile: 0.9998
Maximum: 1.00369
Range: 0.01362
Interquartile range (IQR): 0.00222

Descriptive statistics

Standard deviation: 0.001868917133
Coefficient of variation (CV): 0.001875088146
Kurtosis: 0.8306587623
Mean: 0.9967089478
Median Absolute Deviation (MAD): 0.0011
Skewness: 0.04477785573
Sum: 1354.52746
Variance: 3.492851248e-06
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0.9968 | 33 | 2.4%
0.9976 | 30 | 2.2%
0.9972 | 29 | 2.1%
0.998 | 28 | 2.1%
0.9962 | 23 | 1.7%
0.9964 | 22 | 1.6%
0.9978 | 22 | 1.6%
0.9982 | 21 | 1.5%
0.997 | 21 | 1.5%
0.9966 | 20 | 1.5%
Other values (426) | 1110 | 81.7%

Minimum 5 values

Value | Count | Frequency (%)
0.99007 | 1 | 0.1%
0.9902 | 1 | 0.1%
0.99064 | 1 | 0.1%
0.9908 | 1 | 0.1%
0.99084 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
1.00369 | 1 | 0.1%
1.0032 | 1 | 0.1%
1.00315 | 2 | 0.1%
1.00289 | 1 | 0.1%
1.0026 | 2 | 0.1%

pH
Real number (ℝ≥0)

Distinct count: 89
Unique (%): 6.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.309786607799853
Minimum: 2.74
Maximum: 4.01
Zeros: 0
Zeros (%): 0.0%
Memory size: 10.6 KiB

Quantile statistics

Minimum: 2.74
5-th percentile: 3.06
Q1: 3.21
Median: 3.31
Q3: 3.4
95-th percentile: 3.57
Maximum: 4.01
Range: 1.27
Interquartile range (IQR): 0.19

Descriptive statistics

Standard deviation: 0.1550363113
Coefficient of variation (CV): 0.04684178458
Kurtosis: 0.8797897393
Mean: 3.309786608
Median Absolute Deviation (MAD): 0.1
Skewness: 0.2320322752
Sum: 4498
Variance: 0.02403625782
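Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
3.3 | 47 | 3.5%
3.26 | 45 | 3.3%
3.36 | 42 | 3.1%
3.38 | 41 | 3.0%
3.32 | 40 | 2.9%
3.34 | 40 | 2.9%
3.39 | 39 | 2.9%
3.31 | 37 | 2.7%
3.28 | 37 | 2.7%
3.22 | 35 | 2.6%
Other values (79) | 956 | 70.3%

Minimum 5 values

Value | Count | Frequency (%)
2.74 | 1 | 0.1%
2.86 | 1 | 0.1%
2.87 | 1 | 0.1%
2.88 | 2 | 0.1%
2.89 | 2 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
4.01 | 2 | 0.1%
3.9 | 2 | 0.1%
3.85 | 1 | 0.1%
3.78 | 2 | 0.1%
3.75 | 1 | 0.1%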

sulphates
Real number (ℝ≥0)

Distinct count: 96
Unique (%): 7.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.6587049300956587
Minimum: 0.33
Maximum: 2.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 10.6 KiB

Quantile statistics

Minimum: 0.33
5-th percentile: 0.47
Q1: 0.55
Median: 0.62
Q3: 0.73
95-th percentile: 0.94
Maximum: 2
Range: 1.67
Interquartile range (IQR): 0.18

Descriptive statistics

Standard deviation: 0.1706668906
Coefficient of variation (CV): 0.2590946003
Kurtosis: 11.10228226
Mean: 0.6587049301
Median Absolute Deviation (MAD): 0.08
Skewness: 2.406504615
Sum: 895.18
Variance: 0.02912718754
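Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0.54 | 58 | 4.3%
0.58 | 57 | 4.2%
0.6 | 57 | 4.2%
0.62 | 53 | 3.9%
0.56 | 52 | 3.8%
0.57 | 48 | 3.5%
0.53 | 46 | 3.4%
0.59 | 44 | 3.2%
0.55 | 42 | 3.1%
0.61 | 41 | 3.0%
Other values (86) | 861 | 63.4%

Minimum 5 values

Value | Count | Frequency (%)
0.33 | 1 | 0.1%
0.37 | 2 | 0.1%
0.39 | 3 | 0.2%
0.4 | 3 | 0.2%
0.42 | 4 | 0.3%

Maximum 5 values

Value | Count | Frequency (%)
2 | 1 | 0.1%
1.98 | 1 | 0.1%
1.95 | 1 | 0.1%
1.62 | 1 | 0.1%
1.61 | 1 | 0.1%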

alcohol
Real number (ℝ≥0)

Distinct count: 65
Unique (%): 4.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.432315428010792
Minimum: 8.4
Maximum: 14.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 10.6 KiB

Quantile statistics

Minimum: 8.4
5-th percentile: 9.2
Q1: 9.5
Median: 10.2
Q3: 11.1
95-th percentile: 12.5
Maximum: 14.9
Range: 6.5
Interquartile range (IQR): 1.6

Descriptive statistics

Standard deviation: 1.08206545
Coefficient of variation (CV): 0.1037224629
Kurtosis: 0.1597388547
Mean: 10.43231543
Median Absolute Deviation (MAD): 0.7
Skewness: 0.8598411692
Sum: 14177.51667
Variance: 1.170865638
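Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
9.5 | 111 | 8.2%
9.4 | 91 | 6.7%
9.2 | 65 | 4.8%
9.8 | 63 | 4.6%
10 | 61 | 4.5%
9.3 | 56 | 4.1%
10.5 | 53 | 3.9%
9.6 | 49 | 3.6%
9.7 | 47 | 3.5%
11 | 46 | 3.4%
Other values (55) | 717 | 52.8%

Minimum 5 values

Value | Count | Frequency (%)
8.4 | 2 | 0.1%
8.5 | 1 | 0.1%
8.7 | 2 | 0.1%
8.8 | 1 | 0.1%
9 | 21 | 1.5%

Maximum 5 values

Value | Count | Frequency (%)
14.9 | 1 | 0.1%
14 | 6 | 0.4%
13.6 | 4 | 0.3%
13.56666667 | 1 | 0.1%
13.5 | 1 | 0.1%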

quality
Real number (ℝ≥0)

Distinct count: 6
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.6232523914643116
Minimum: 3
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Memory size: 10.6 KiB

Quantile statistics

Minimum: 3
5-th percentile: 5
Q1: 5
Median: 6
Q3: 6
95-th percentile: 7
Maximum: 8
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.8235780017
Coefficient of variation (CV): 0.1464593698
Kurtosis: 0.3402560881
Mean: 5.623252391
Median Absolute Deviation (MAD): 1
Skewness: 0.1924065873
Sum: 7642
Variance: 0.6782807249
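Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
5 | 577 | 42.5%
6 | 535 | 39.4%
7 | 167 | 12.3%
4 | 53 | 3.9%
8 | 17 | 1.3%
3 | 10 | 0.7%

Minimum 5 values

Value | Count | Frequency (%)
3 | 10 | 0.7%
4 | 53 | 3.9%
5 | 577 | 42.5%
6 | 535 | 39.4%
7 | 167 | 12.3%

Maximum 5 values

Value | Count | Frequency (%)
8 | 17 | 1.3%
7 | 167 | 12.3%
6 | 535 | 39.4%
5 | 577 | 42.5%
4 | 53 | 3.9%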

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and positive rescaling of the two variables, so for points lying exactly on a line the steepness of the line does not change the magnitude of r; only the direction of the slope sets its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
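
The calculation described above can be sketched in plain Python. This is a minimal illustration rather than the code the report itself runs; the function name `pearson_r` and the toy data are made up for this example.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so unnormalised sums are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Toy data, invented for illustration (not taken from the dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(x, y)

# Shifting either variable, or rescaling it by a positive factor, leaves r unchanged.
r_shifted = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_shifted)
```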

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
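
As a sketch of the calculation described above, the plain-Python example below ranks each variable and then applies the covariance-over-standard-deviations formula to the ranks. The helper names (`average_ranks`, `spearman_rho`) are illustrative, and giving tied values the average of their positions is an assumption of this sketch, since the report does not say how ties are ranked.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the rank variables."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_rx, mean_ry = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry))
    var_x = sum((a - mean_rx) ** 2 for a in rx)
    var_y = sum((b - mean_ry) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Toy data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
print(rho)
```

Because only the ordering of the values matters, any strictly increasing transformation of the toy data yields the same ρ.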

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, all divided by the total number of pairs.
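
The pair-counting definition above can be sketched directly in plain Python. This follows the formula exactly as stated (often called tau-a); statistical libraries frequently use a tie-adjusted variant instead, and the report does not say which one was used, so treat the function below (a hypothetical `kendall_tau_a`) as an illustration of the definition only.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau as (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive when both variables move in the same direction between i and j.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Toy data with one swapped pair: 5 concordant pairs and 1 discordant pair out of 6.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```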

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Missing values

Sample

First rows

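(row) | df_index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality
0 | 0 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5
1 | 1 | 7.8 | 0.88 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.20 | 0.68 | 9.8 | 5
2 | 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.9970 | 3.26 | 0.65 | 9.8 | 5
3 | 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.9980 | 3.16 | 0.58 | 9.8 | 6
4 | 5 | 7.4 | 0.66 | 0.00 | 1.8 | 0.075 | 13.0 | 40.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5
5 | 6 | 7.9 | 0.60 | 0.06 | 1.6 | 0.069 | 15.0 | 59.0 | 0.9964 | 3.30 | 0.46 | 9.4 | 5
6 | 7 | 7.3 | 0.65 | 0.00 | 1.2 | 0.065 | 15.0 | 21.0 | 0.9946 | 3.39 | 0.47 | 10.0 | 7
7 | 8 | 7.8 | 0.58 | 0.02 | 2.0 | 0.073 | 9.0 | 18.0 | 0.9968 | 3.36 | 0.57 | 9.5 | 7
8 | 9 | 7.5 | 0.50 | 0.36 | 6.1 | 0.071 | 17.0 | 102.0 | 0.9978 | 3.35 | 0.80 | 10.5 | 5
9 | 10 | 6.7 | 0.58 | 0.08 | 1.8 | 0.097 | 15.0 | 65.0 | 0.9959 | 3.28 | 0.54 | 9.2 | 5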

Last rows

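(row) | df_index | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality
1349 | 1588 | 7.2 | 0.660 | 0.33 | 2.5 | 0.068 | 34.0 | 102.0 | 0.99414 | 3.27 | 0.78 | 12.8 | 6
1350 | 1589 | 6.6 | 0.725 | 0.20 | 7.8 | 0.073 | 29.0 | 79.0 | 0.99770 | 3.29 | 0.54 | 9.2 | 5
1351 | 1590 | 6.3 | 0.550 | 0.15 | 1.8 | 0.077 | 26.0 | 35.0 | 0.99314 | 3.32 | 0.82 | 11.6 | 6
1352 | 1591 | 5.4 | 0.740 | 0.09 | 1.7 | 0.089 | 16.0 | 26.0 | 0.99402 | 3.67 | 0.56 | 11.6 | 6
1353 | 1592 | 6.3 | 0.510 | 0.13 | 2.3 | 0.076 | 29.0 | 40.0 | 0.99574 | 3.42 | 0.75 | 11.0 | 6
1354 | 1593 | 6.8 | 0.620 | 0.08 | 1.9 | 0.068 | 28.0 | 38.0 | 0.99651 | 3.42 | 0.82 | 9.5 | 6
1355 | 1594 | 6.2 | 0.600 | 0.08 | 2.0 | 0.090 | 32.0 | 44.0 | 0.99490 | 3.45 | 0.58 | 10.5 | 5
1356 | 1595 | 5.9 | 0.550 | 0.10 | 2.2 | 0.062 | 39.0 | 51.0 | 0.99512 | 3.52 | 0.76 | 11.2 | 6
1357 | 1597 | 5.9 | 0.645 | 0.12 | 2.0 | 0.075 | 32.0 | 44.0 | 0.99547 | 3.57 | 0.71 | 10.2 | 5
1358 | 1598 | 6.0 | 0.310 | 0.47 | 3.6 | 0.067 | 18.0 | 42.0 | 0.99549 | 3.39 | 0.66 | 11.0 | 6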